Concept-Based Automatic Amharic Document Categorization

Authors

  • Meron Sahlemariam
  • Mulugeta Libsie
  • Daniel Yacob
Abstract

As the volume of information resources continues to grow, so does interest in better solutions for finding, filtering, and organizing them. Automatic text categorization can play an important role in a wide variety of flexible, dynamic, and personalized information management tasks. The aim of this research is to use concepts to improve the categorization process for Amharic documents. In recent years, ontology-based document categorization methods have been introduced to address the document classification problem. Previous work on keyword-based document categorization does not consider the semantic relationships between words. To address this limitation, this research proposes a framework that automatically categorizes Amharic documents into predefined categories using concepts. The results show that a concept-based Amharic document categorizer achieves 92.9% accuracy.

Similar resources

Linguistic Techniques to Improve the Performance of Automatic Text Categorization

This paper presents a method for incorporating natural language processing into existing text categorization procedures. Three aspects are considered in the investigation: (i) a method for weighting terms based on the concept of a probability weighted amount of information, (ii) estimation of term occurrence probabilities using a probabilistic language model, and (iii) automatic extraction of t...

Automatic Text Categorization and Its Application to Text Retrieval

We develop an automatic text categorization approach and investigate its application to text retrieval. The categorization approach is derived from a combination of a learning paradigm known as instance-based learning and an advanced document retrieval technique known as retrieval feedback. We demonstrate the effectiveness of our categorization approach using two real-world document collections...

Automatic speech recognition for an under-resourced language - Amharic

In this paper we present the development of an Automatic Speech Recognition System (ASRS) for Amharic using limited available resources and the freely available speech toolkit (HTK). There are phonological, dialectal, orthographic and morphological features of Amharic that challenge the development of ASRSs. The problem of resource scarcity is also a hindrance to the research and development in...

Dictionary Based Amharic-Arabic Cross Language Information Retrieval

The demand for multilingual information is becoming perceptive as the users of the internet throughout the world are escalating and it creates a problem of retrieving documents in one language by specifying query in another language. This increasing demand can be addressed by designing automatic tools, which accepts the query in one language and retrieves the relevant documents in other languag...

Can Automatic Personal Categorization deal with User Inconsistency?

Document categorization is a daily task in every organization, but it is a very subjective process. While automatic document categorization has been widely studied, much challenging research still remains, to support user subjective categorization. This study evaluates and compares the application of Self-Organizing Maps (SOM) and Learning Vector Quantization (LVQ) to automatic document classif...

Publication date: 2009